Zomato Rating Prediction and Restaurant Recommender

Business Problem

With the onset of the pandemic in 2020, the way India eats has changed. The penetration of online food services in India is set to double by 2025. Restaurants will soon have to find ways to control operating expenses such as rent, electricity, and manpower. Cloud kitchens, or delivery-only outlets, are a viable option for restaurants to explore. This comes with its own challenges, because people now order food based on reviews and ratings. Apart from quality and taste, several factors affect a restaurant's rating, such as the average cost of food, cuisines offered, restaurant type, and location. New restaurants often lack the data needed to forecast how they will perform, so they rely on trial and error to figure out what earns good reviews. This takes up time and resources that many small restaurants can’t afford, so they may be unable to compete with well-known, established restaurants and may ultimately fail. New restaurants also face many questions, from deciding on a location to choosing which cuisines to offer. This project addresses these issues by analyzing Zomato data to predict a rating from certain features and to understand the tastes of Bengalureans, helping restaurants succeed by making the right choices.

Variables & Descriptions:


Variable: Description
url: contains the URL of the restaurant on the Zomato website
address: contains the address of the restaurant in Bangalore
name: contains the name of the restaurant in Bangalore
online_order: whether online ordering is available in the restaurant or not
book_table: whether table booking is available in the restaurant or not
rate: contains the overall rating of the restaurant out of 5
votes: contains the total number of ratings for the restaurant
phone: contains the phone number of the restaurant
location: contains the neighbourhood in which the restaurant is located
rest_type: restaurant type
dish_liked: dishes liked by people in the restaurant
cuisines: food styles offered
approx_cost(for two people): contains the approximate cost of meal for two people
reviews_list: list of tuples containing reviews for the restaurant, each tuple
menu_item: contains list of menus available in the restaurant
listed_in(city): contains the neighborhood in which the restaurant is listed

Importing Libraries

Reading Data

It can be seen from the above output that some features contain null values. We need to treat those values.

'dish_liked' has about 54% missing values. If we try to fill in this many missing values we will introduce bias, hence we are dropping the column.

'url' and 'phone' do not add any value for predicting the rating, hence we will drop those columns as well.
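The missing-value check and column drops described above can be sketched with pandas as follows. This is a minimal illustration on a toy DataFrame with invented values, not the notebook's original code; only the column names come from the variable list.

```python
import pandas as pd

# Toy stand-in for the Zomato data; all values are invented for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "url": ["u1", "u2", "u3", "u4"],
    "phone": ["p1", None, "p3", "p4"],
    "dish_liked": [None, "Pasta", None, None],
    "rate": ["4.1/5", "NEW", None, "3.8/5"],
})

# Percentage of missing values per column, highest first.
missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# Drop the sparse column and the two identifier-like columns.
df = df.drop(columns=["dish_liked", "url", "phone"])
print(df.columns.tolist())
```

Dropping by column name keeps the step readable and makes it fail loudly if a column is missing.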

The 'rate' column has about 15% missing values and is stored as strings, which is not ideal. Hence we will convert 'rate' to float format.

We see values such as 'NEW', '-', and NaN that cannot be converted to float. We can't remove these rows, as they might be new restaurants without ratings, so we will write a function that treats them as exceptions and converts them to NaN values.

We have converted 'NEW' and '-' values into NaN values
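A conversion function of this kind might look like the sketch below. It assumes the raw ratings are strings such as "4.1/5", which the text does not state explicitly, and the function name `to_rating` is hypothetical.

```python
import math


def to_rating(value):
    """Return the rating as a float, or NaN when it cannot be parsed."""
    try:
        # Assumed raw format: "4.1/5"; keep only the part before the slash.
        return float(str(value).split("/")[0].strip())
    except ValueError:
        # Placeholders such as 'NEW' and '-' (and None) end up here.
        return float("nan")


for raw in ["4.1/5", "3.8 /5", "NEW", "-", None]:
    print(repr(raw), "->", to_rating(raw))
```

With pandas, the function could be applied to the whole column using `df["rate"].apply(to_rating)`.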

EDA

Visualizing restaurant distribution by location

We see that 'BTM' has the highest number of restaurants, followed by 'HSR' and 'Koramangala'. All are known for their closeness to IT hubs and the Outer Ring Road. This suggests that the IT industry has been a driving force of the restaurant industry in Bengaluru.

Restaurant distribution by type of service offered

We can see that the majority of restaurants support food delivery, which is an indication of the growing online trend.

Restaurant distribution by subcategories

The 'Quick-bites' category has the majority share, which indicates a growing fast-food culture in the city.

How many of these restaurants accept online orders?

We can see that almost 58% of restaurants accept online orders. Roughly 42% of restaurants have not yet adopted an online delivery service, so there is still potential for online delivery companies to penetrate the market.

How many of these restaurants offer dine-in service?

We can see that 87.5% of the restaurants offer dine-in. Comparing the two charts above, it can be concluded that a large share of restaurants offer both online and dine-in service, and only about 12.5% of restaurants are online-only. These may represent cloud kitchens.

We can see that there is a huge, still untapped market opportunity in cloud kitchen services, and competition in this segment is not yet intense.

Top 10 restaurant chains based on number of outlets

'Cafe Coffee Day' has the highest number of outlets, followed by 'Onesta' and 'Just Bake'. This shows the love of Bengalureans for coffee, pizza, and desserts.
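Ranking chains by outlet count amounts to counting how often each restaurant name appears. The sketch below uses the standard library on an invented list of names. The notebook itself presumably worked on the 'name' column of the DataFrame, which is an assumption here.

```python
from collections import Counter

# Invented sample of the 'name' column; each entry is one outlet.
names = [
    "Cafe Coffee Day", "Onesta", "Cafe Coffee Day",
    "Just Bake", "Onesta", "Cafe Coffee Day",
]

# Count outlets per name and keep the ten most frequent chains.
outlet_counts = Counter(names)
top_chains = outlet_counts.most_common(10)

for chain, outlets in top_chains:
    print(f"{chain}: {outlets} outlets")
```

With pandas, `df["name"].value_counts().head(10)` produces an equivalent ranking.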

Top restaurants in quick-bite category

Looks like 'Five Star Chicken' is leading the quick-bites category, followed by "Domino's" and "McDonald's".

It looks like multinational food service giants are leading this category. Domino's, which is operated in India by the Indian company Jubilant FoodWorks, stands second in the segment.

Top restaurants in casual dining category

'Empire Restaurant', 'Beijing Bites', and 'Mani's Dum Biryani' are the top 3 restaurants in the casual dining category. This shows the Bengalureans' love for biryani and Chinese cuisine.

Top 10 restaurants based on rating

There seems to be an error in the name of the restaurant. Let's try to correct it.

Looks like the original name is 'Santa Spa Cuisine'.

10 Lowest rated Restaurant Chains

Distribution of rating

From the visualization we can infer that 3.9 is the most common rating.

Distribution of cost for 2 people

From the above plot we can conclude that a meal for two people costs approximately INR 500 in the majority of restaurants in Bengaluru.

Best budget friendly restaurants

Best restaurants for fine dining

Top rated restaurants

Finest taste on budget

Top high rated restaurants when budget is not an issue

Locations where you can find best restaurants when on budget

Locations where you can find best restaurants on the expensive side

Let's visualize the distribution of restaurants in Bengaluru

We can see that most restaurants are concentrated in the south-eastern part of Bengaluru. There are also quite a few areas around central Bengaluru where the restaurant concentration is not that high. These might be potential locations for new restaurants looking to capture the market without facing a lot of competition.

Sentiment Analysis

The data reveals that most restaurants have a positive rating, which is good news for the restaurant industry.

Data transformation for model building

Let's split the data into training and test sets. We will be using 80% of the data for training and 20% of the data as the test set.
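An 80/20 split like the one described above is commonly done with scikit-learn's `train_test_split`. The sketch below runs on synthetic data. The feature matrix, the target, and the `random_state` value are assumptions for illustration, not the notebook's actual inputs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 100 restaurants, 3 numeric features, ratings in [1, 5].
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = rng.uniform(1.0, 5.0, size=100)

# Hold out 20% of the rows as the test set; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split identical across runs, so model scores stay comparable.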

Model building

Linear Regression

The model accuracy for the linear regression model is very low. Regularized variants such as Lasso and Ridge are unlikely to improve the model performance drastically. Hence, let's check out other models before deciding on hyperparameter tuning.

Decision Tree

The tree-based model is performing well in terms of R squared score. Let us use ensemble learning to boost the performance further.
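For reference, the R squared score compares the model's squared errors with those of a baseline that always predicts the mean rating. The sketch below implements the standard definition in plain Python. The helper name `r_squared` and the sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 4.0, 5.0]
perfect = r_squared(actual, [3.0, 4.0, 5.0])   # predictions match exactly
baseline = r_squared(actual, [4.0, 4.0, 4.0])  # always predicts the mean
print(perfect, baseline)
```

A score of 1 means perfect predictions, while 0 means the model does no better than predicting the mean.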

Random Forest

Random Forest Regressor is performing better than the decision tree regressor. Let's check out other models as well.

Gradient Boosting

Performance of the gradient boosting algorithm has degraded a little compared to the random forest regressor. Let's check out XGBoost (extreme gradient boosting).

XGBoost

Although the XGBoost regressor performed well compared to gradient boosting, its score is still lower than that of the random forest regressor.

Tabulating the model results

As we can see, Random Forest Regressor seems to be the best performing model. We will save this model for making further predictions.
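A comparison table like the one above can be produced by fitting each regressor on the same split and collecting the test scores. The sketch below does this on synthetic data with scikit-learn defaults, so its scores say nothing about the Zomato results. XGBoost is left out to keep the example limited to scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear data standing in for the prepared restaurant features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 3 + np.sin(6 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.05, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every model on the same training data and score it on the same test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print the models from best to worst R squared score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} R^2 = {score:.3f}")
```

Scoring all models on one held-out test set keeps the comparison fair.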

Model Prediction

Saving the model for deployment
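One common way to persist a fitted scikit-learn model is Python's built-in `pickle` module. The sketch below is an assumption about how this step could look, since the text does not say which format was used. The small model, the synthetic data, and the file name `rating_model.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small stand-in model trained on synthetic data.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(50, 3))
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=1).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rating_model.pkl")

    # Serialize the fitted model to disk.
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)

    # Load it back, as a deployment script would.
    with open(model_path, "rb") as fh:
        restored = pickle.load(fh)

# The restored model should reproduce the original predictions.
print(np.allclose(model.predict(X), restored.predict(X)))
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.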

Recommender system

Search for a restaurant by name

Get recommendations of similar restaurants

We have got a recommendation of the top 10 similar restaurants based on their cosine score.
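The idea behind a cosine-score recommender can be sketched in plain Python. The example below represents each restaurant as a bag of words and ranks the others by cosine similarity. This is a simplified illustration under assumed inputs: the text does not say which text features or vectorizer were used, and the restaurant names, profiles, and helper names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(name, profiles, top_n=10):
    """Return the top_n restaurants most similar to `name`, best first."""
    vectors = {r: Counter(text.lower().split()) for r, text in profiles.items()}
    query = vectors[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented text profiles, e.g. cuisines plus restaurant type.
profiles = {
    "Restaurant A": "north indian biryani casual dining",
    "Restaurant B": "north indian biryani kebab casual dining",
    "Restaurant C": "cafe coffee desserts",
    "Restaurant D": "chinese casual dining",
}

recs = recommend("Restaurant A", profiles, top_n=2)
for other, score in recs:
    print(f"{other}: {score:.3f}")
```

Cosine similarity ignores vector length, so a restaurant with a longer description is not favored merely for having more words.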

Conclusion

In this project, an attempt has been made to understand the restaurant industry in Bengaluru and to predict the rating of a restaurant using a random forest algorithm, which reached a reported accuracy (R squared score) of 98.9%. The prediction is based on location, availability of online ordering, availability of dine-in service, type of cuisines offered, type of service offered, number of votes received, and the sentiment of the reviews. A recommender based on NLP was also built to suggest similar restaurants to customers. This will empower restaurants with the data required to make decisions that help their business succeed. Future research could make use of other significant factors, including foot traffic; competition, i.e. the number of similar businesses that could impact the new business being established; accessibility; and the average business rates that could be incurred for a particular type of restaurant. These factors could help make the analysis more accurate.